Constructing Treatment Portfolios Using Affinity Propagation

Authors

  • Delbert Dueck
  • Brendan J. Frey
  • Nebojsa Jojic
  • Vladimir Jojic
  • Guri Giaever
  • Andrew Emili
  • Gabe Musso
  • Robert Hegele
Abstract

A key problem of interest to biologists and medical researchers is the selection of a subset of queries or treatments that provide maximum utility for a population of targets. For example, when studying how gene deletion mutants respond to each of thousands of drugs, it is desirable to identify a small subset of genes that nearly uniquely define a drug ‘footprint’, one that provides maximum predictability of the organism’s response to the drugs. As another example, when designing a cocktail of HIV genome sequences to be used as a vaccine, it is desirable to identify a small number of sequences that provide maximum immunological protection to a specified population of recipients. We refer to this task as ‘treatment portfolio design’ and formalize it as a facility location problem. Finding an optimal treatment portfolio is NP-hard in the size of the portfolio and the number of targets, but a variety of greedy algorithms can be applied. We introduce a new algorithm for treatment portfolio design based on insights similar to those that made the recently published affinity propagation algorithm work quite well for clustering tasks. We demonstrate this method on the two problems described above: selecting a subset of yeast genes that act as a drug-response footprint, and selecting a subset of vaccine sequences that provide maximum epitope coverage for an HIV genome population.

1 Treatment Portfolio Design (TPD)

A central question for any computational researcher collaborating with a biologist or medical researcher is in what form computational analyses should be handed over to the experimentalist or clinician. While application-specific predictions are often most appropriate, we have found that in many cases what is needed is a selection among the potential options available to the biologist or medical researcher, chosen so as to maximize the amount of information gleaned from an experiment, which can often be viewed as consisting of independently assayed targets.
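The abstract frames TPD as a facility location problem, notes that finding an optimal portfolio is NP-hard, and says greedy algorithms can be applied. The Python sketch below illustrates that trade-off on a tiny hypothetical instance; it is not the affinity-propagation-based method the paper introduces. It assumes a fixed portfolio size k and a facility-location objective in which each target is credited with the utility of the best treatment in the portfolio, summed over targets. The helper names (`net_utility`, `greedy_portfolio`, `exhaustive_portfolio`) and the utility matrix are illustrative only.

```python
from itertools import combinations


def net_utility(u, portfolio):
    """Assumed facility-location objective: each target (column of u) is
    credited with the best utility offered by any treatment (row) in the
    portfolio, and these per-target utilities are summed."""
    return sum(max(u[t][r] for t in portfolio) for r in range(len(u[0])))


def greedy_portfolio(u, k):
    """Greedy forward selection: repeatedly add the treatment that yields the
    largest net utility until the portfolio holds k treatments.
    Ties are broken in favour of the lowest treatment index."""
    portfolio = []
    candidates = set(range(len(u)))
    for _ in range(k):
        best = max(sorted(candidates), key=lambda t: net_utility(u, portfolio + [t]))
        portfolio.append(best)
        candidates.remove(best)
    return portfolio, net_utility(u, portfolio)


def exhaustive_portfolio(u, k):
    """Exact answer by scoring every size-k subset of treatments.
    There are C(n, k) subsets, so this is only feasible for toy inputs."""
    best = max(combinations(range(len(u)), k), key=lambda p: net_utility(u, p))
    return list(best), net_utility(u, best)


# Hypothetical toy utilities: 3 treatments (rows) x 4 targets (columns).
# Treatment 0 is a moderate generalist; treatments 1 and 2 are specialists
# for disjoint halves of the targets.
u = [
    [3, 3, 3, 3],
    [5, 5, 0, 0],
    [0, 0, 5, 5],
]

print(greedy_portfolio(u, 2))      # ([0, 1], 16)
print(exhaustive_portfolio(u, 2))  # ([1, 2], 20)
```

On this input greedy picks the generalist first and cannot undo that choice, ending at 16 while exhaustive search finds 20. Exhaustive search, however, must score all C(n, k) subsets; a hypothetical portfolio of 10 drawn from roughly 6000 candidates would already mean more than 10^31 of them. For non-negative utilities this objective is monotone submodular, so greedy selection is guaranteed to reach at least a (1 − 1/e) ≈ 0.63 fraction of the optimum for a fixed portfolio size.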
If the number of options is small, they can be discussed and selected by hand; if it is large, a computational approach may be needed to select the appropriate options. This paper describes the framework and approaches that emerged while we addressed problems of this type with our collaborators. In particular, we show how the affinity propagation algorithm [1] can be used effectively to approach this task.

M. Vingron and L. Wong (Eds.): RECOMB 2008, LNBI 4955, pp. 360–371, 2008. © Springer-Verlag Berlin Heidelberg 2008

For concreteness, we will refer to the possible set of options as ‘treatments’ and the assays used to measure the suitability of the treatments as ‘targets’. Each treatment has a utility for each target, and the goal of what we will refer to as treatment portfolio design (TPD) is to select a subset of treatments (the portfolio) so as to maximize the net utility for the targets. The terms ‘treatment’, ‘target’ and ‘utility’ can take on quite different meanings, depending on the application. Treatments might correspond to queries, probes or experimental procedures, while targets might correspond to disease conditions, genes or DNA binding events.

Example 1: The treatments are a set of potential yeast gene deletion strains used to query drug response, the targets are all ∼6000 yeast gene deletion strains, and the utility is the number of gene-drug interactions in all strains that are predicted by the selected portfolio of strains.

Example 2: The treatments are a large set of potential vaccines derived from HIV genomes, the targets are a population of HIV epitopes likely to be present in a demographic with high infection risk, and the utility is the level of immunological protection, i.e., the number of epitopes present in the selected portfolio of HIV vaccines.
Example 3: The treatments are a set of baseline demographic, anthropometric, biochemical and DNA SNP variables thought to be predictive of cardiovascular endpoints and postulated to form a clinical set of risk factors, the targets are ∼4,000,000 disease end-point targets comprising ∼20,000 patients and ∼200 conditions, and the utility is the predictability of disease end-points, including risk.

Example 4: The treatments are a set of laboratory procedures used to synthesize biologically active compounds, the targets are a list of desired compounds to be synthesized, and the utility is the negative of the financial cost needed to synthesize all target compounds using the selected portfolio of laboratory procedures.

Example 5: The treatments are a large set of microRNAs potentially involved in regulating the expression of disease-associated genes, the targets are a list of gene-disease pairs, and the utility is the net corrected correlation between gene expression and the expression of the microRNAs in the portfolio, over all disease conditions.

The input to TPD is a set of potential treatments or queries $\mathcal{T}$, a representative population of targets $\mathcal{R}$, and a utility function $u : \mathcal{T} \times \mathcal{R} \to \mathbb{R}$, where $u(T,R)$ is the utility of applying treatment $T \in \mathcal{T}$ to target $R \in \mathcal{R}$. This utility may be based on a variety of factors, including the benefit of the treatment, its cost, the time to application, the time to response, the estimated risk, etc. The goal of computational TPD is to select a subset of treatments $\mathcal{P} \subseteq \mathcal{T}$ (called the ‘portfolio’) so as to maximize their net utility for the target population. A defining aspect of the utility function is that it is additive; for portfolio $\mathcal{P}$, the net utility is ∑
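Assuming the net utility takes the facility-location form suggested by the abstract, $\sum_{R \in \mathcal{R}} \max_{T \in \mathcal{P}} u(T,R)$, Example 2 can be sketched in Python with a binary utility: $u(T,R) = 1$ if vaccine $T$ contains epitope $R$ and 0 otherwise, so the net utility counts the target epitopes present in at least one selected vaccine. The peptide strings and the helpers `utility_matrix` and `net_utility` are hypothetical illustrations, not the paper's data or code, and exact substring matching is a simplification assumed here.

```python
def utility_matrix(vaccines, epitopes):
    """Assumed binary utility u(T, R): 1 if epitope R occurs as an exact
    substring of vaccine sequence T, else 0.
    Rows are treatments (vaccines), columns are targets (epitopes)."""
    return [[1 if e in v else 0 for e in epitopes] for v in vaccines]


def net_utility(u, portfolio):
    """Sum over targets of the best utility any selected treatment provides.
    With binary utilities this is the number of covered epitopes."""
    return sum(max(u[t][r] for t in portfolio) for r in range(len(u[0])))


# Hypothetical toy data: short peptide strings stand in for HIV sequences.
vaccines = ["KLMAVYWQRG", "SLYNTVATLY", "GAVYWQRSLY"]
epitopes = ["AVYWQ", "SLYNT", "NTVAT", "WQRSL", "KLMAV"]

u = utility_matrix(vaccines, epitopes)
for portfolio in ([0, 1], [0, 2], [1, 2], [0, 1, 2]):
    print(portfolio, net_utility(u, portfolio))
```

In this toy input no pair of vaccines covers all five epitopes; the best pairs cover four, and all three vaccines together cover five. Taking the maximum per target, rather than summing utilities across treatments, means an epitope present in several selected vaccines is counted only once.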





Publication date: 2008